Introduction

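This project sets out to explore data published by the World Bank at https://data.worldbank.org/, containing the 2017 values of a selection of ‘global development indicators’ for 68 countries. Each observation corresponds to a single country, and most of the variables correspond to a development indicator. In particular, the data contains 14 variables: country, region, income, GDP.percap, Market.Cap.pcntGDP, Unemployment.female, Unemployment.male, Education.Expend, Arable.Land.pcnt, Life.Expect.female, Life.Expect.male, Mortality.u5, CO2.emiss.mtpercap and Access2Elec.pcnt.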

Exploratory data analysis

Structure

As mentioned above, our dataset has 68 observations and 14 variables. The first three variables are character vectors, while the remaining eleven are continuous numeric variables.

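# High-level information: show the full data set in a scrollable HTML table
library(kableExtra)  # kbl(), kable_styling(), scroll_box()
library(magrittr)    # %>%
kbl(dev.inc) %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "500px")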
country region income GDP.percap Market.Cap.pcntGDP Unemployment.female Unemployment.male Education.Expend Arable.Land.pcnt Life.Expect.female Life.Expect.male Mortality.u5 CO2.emiss.mtpercap Access2Elec.pcnt
United Arab Emirates Middle East & North Africa High income 40644.791 60.1920579 7.111 1.472000 NA 0.6265841 79.008 76.966 7.2 21.9506143 100.00000
Argentina Latin America & Caribbean Upper middle income 14613.042 16.8948426 9.473 7.528000 5.45432 14.3238730 79.726 72.924 10.3 4.0894715 100.00000
Austria Europe & Central Asia High income 47312.006 36.1929246 5.032 5.912000 5.37189 16.1044595 84.000 79.400 3.6 7.4827515 100.00000
Australia East Asia & Pacific High income 53934.250 113.6846991 5.670 5.520000 5.12790 3.9979095 84.600 80.500 3.8 15.7386474 100.00000
Barbados Latin America & Caribbean High income 17391.669 67.3145842 8.778 8.468000 4.36638 16.2790698 80.307 77.562 13.4 4.1924473 100.00000
Bangladesh South Asia Lower middle income 1563.768 34.5114059 6.710 3.335000 NA 59.5938388 73.978 70.409 34.2 0.4929066 88.00000
Belgium Europe & Central Asia High income 44089.310 87.2928335 7.058 7.118000 6.42534 27.6089828 83.900 79.200 4.1 8.1554911 100.00000
Bahrain Middle East & North Africa High income 23742.937 61.1893688 3.837 0.483000 2.32452 2.0512821 78.143 76.188 7.3 20.3336240 100.00000
Brazil Latin America & Caribbean Upper middle income 9928.676 46.2664552 14.718 11.369000 6.32048 6.6715800 79.156 71.804 15.4 2.1644215 99.80000
Canada North America High income 45129.356 143.5220535 5.851 6.781000 NA 4.2951998 84.000 79.900 5.2 15.3852910 100.00000
Switzerland Europe & Central Asia High income 83352.089 239.3965223 5.063 4.572000 4.95157 10.0765183 85.600 81.600 4.2 4.5694192 100.00000
Cote d’Ivoire Sub-Saharan Africa Lower middle income 2111.027 24.2026471 3.870 2.860000 3.80348 11.0062893 58.320 55.864 86.0 0.4247575 65.60000
Chile Latin America & Caribbean High income 14998.817 106.3678182 7.462 6.602000 5.41966 1.7242029 82.333 77.333 7.4 4.7123958 99.70000
China East Asia & Pacific Upper middle income 8816.987 70.7634243 3.853 4.913000 3.66745 12.6785033 78.828 74.315 9.2 7.1749480 100.00000
Colombia Latin America & Caribbean Upper middle income 6376.707 38.9495245 11.502 6.865000 4.53551 5.4249662 79.694 74.124 14.6 1.5551062 98.50000
Costa Rica Latin America & Caribbean Upper middle income 12225.574 4.9751435 10.290 6.820000 7.06981 4.9255778 82.557 77.354 8.7 1.6525403 99.60000
Cyprus Europe & Central Asia High income 26608.875 12.3405769 11.265 10.865000 5.75417 10.2088745 82.794 78.547 2.8 6.1880926 100.00000
Germany Europe & Central Asia High income 44542.295 61.4445107 3.317 4.129000 4.88274 33.6949366 83.400 78.700 3.9 8.8582937 100.00000
Algeria Middle East & North Africa Lower middle income 4109.698 0.2076133 18.415 8.315000 6.50538 3.1367000 77.735 75.307 24.3 3.5057477 99.61514
Egypt, Arab Rep.  Middle East & North Africa Lower middle income 2444.290 19.7452044 22.749 8.127000 NA 2.9242742 73.967 69.453 21.7 2.4749439 100.00000
Spain Europe & Central Asia High income 28100.586 67.8866149 19.022 15.654000 4.20778 24.5297408 86.100 80.600 3.3 5.6540396 100.00000
France Europe & Central Asia High income 38685.258 106.2027715 9.374 9.444000 5.45160 33.7213605 85.700 79.600 4.3 4.7275756 100.00000
Greece Europe & Central Asia High income 18536.191 25.3849319 26.117 17.851999 3.47221 16.5865012 83.900 78.800 4.3 6.2112500 100.00000
Croatia Europe & Central Asia High income 13629.290 40.4963655 11.908 10.608000 3.90567 14.4371797 80.900 74.900 4.8 4.2380576 100.00000
Hungary Europe & Central Asia High income 14623.697 22.0446330 4.573 3.813000 4.61973 47.3701512 79.300 72.500 4.4 4.7558400 100.00000
Indonesia East Asia & Pacific Lower middle income 3837.578 51.2679274 3.599 4.059000 2.66998 14.0078476 73.515 69.156 25.6 2.0136711 98.14000
Ireland Europe & Central Asia High income 69601.684 43.7994987 6.287 7.066000 3.46885 6.6918276 84.000 80.400 3.4 7.8067341 100.00000
Israel Middle East & North Africa High income 40774.130 65.0333725 4.326 4.125000 6.10273 17.8835490 84.600 80.600 3.8 7.5792180 100.00000
India South Asia Lower middle income 1980.667 96.3988262 5.357 5.359000 NA 52.6088141 70.425 68.000 38.6 1.7191902 92.45683
Iran, Islamic Rep.  Middle East & North Africa Lower middle income 5520.315 23.8756332 19.938 10.339000 3.79040 9.0172892 77.436 75.217 14.4 7.6949310 99.94000
Jamaica Latin America & Caribbean Upper middle income 5070.100 63.5079773 15.382 8.426000 5.26017 11.0803324 75.878 72.708 14.6 2.4616139 97.62187
Jordan Middle East & North Africa Upper middle income 4231.518 57.8825440 27.152 15.762000 3.22854 2.1063303 76.052 72.628 16.4 2.6671190 100.00000
Japan East Asia & Pacific High income 38891.086 126.2021994 2.635 2.927000 NA 11.4156379 87.260 81.090 2.6 9.0856391 100.00000
Korea, Rep.  East Asia & Pacific High income 31616.843 109.1056282 3.476 3.775000 4.32824 14.3236591 85.700 79.700 3.3 12.1753647 100.00000
Kazakhstan Europe & Central Asia Upper middle income 9247.581 27.3121497 5.435 4.404000 2.75082 10.9853317 76.920 68.720 10.5 12.5048676 100.00000
Lebanon Middle East & North Africa Upper middle income 7819.605 21.5505164 13.806 9.113000 2.13294 12.9032258 80.786 77.031 7.7 4.2936499 100.00000
Sri Lanka South Asia Lower middle income 4077.044 21.6858354 6.305 2.819000 2.79925 21.3303605 79.979 73.238 7.8 1.0870173 97.50000
Luxembourg Europe & Central Asia High income 109921.031 104.7127658 5.498 5.538000 3.56959 25.5300412 84.400 79.900 2.8 15.0921628 100.00000
Morocco Middle East & North Africa Lower middle income 3035.454 61.1294788 10.683 8.737001 NA 16.7546493 77.438 74.948 21.2 1.8529418 100.00000
Malta Middle East & North Africa High income 28250.698 39.1125734 4.249 3.829000 4.65163 28.3437500 84.600 80.200 6.6 3.2478702 100.00000
Mauritius Sub-Saharan Africa Upper middle income 10484.908 73.4796122 10.062 4.650000 5.02313 36.9458128 77.890 71.300 14.9 3.3053590 99.61000
Mexico Latin America & Caribbean Upper middle income 9287.850 35.9837630 3.602 3.312000 4.51822 12.2971270 77.827 72.046 15.2 3.7812158 100.00000
Malaysia East Asia & Pacific Upper middle income 10259.305 142.8251821 3.825 3.151000 4.67531 2.5140770 78.008 73.903 8.3 7.1658085 100.00000
Namibia Sub-Saharan Africa Upper middle income 5367.115 22.6071805 21.810 21.476000 9.75998 0.9717111 65.823 60.020 44.9 1.8021970 52.50000
Nigeria Sub-Saharan Africa Lower middle income 1968.565 9.9049820 9.257 7.679000 NA 37.3310496 54.843 53.086 122.5 0.5915968 54.40000
Netherlands Europe & Central Asia High income 48554.992 132.2544227 5.253 4.482000 5.17510 30.7989308 83.400 80.200 4.0 9.1008876 100.00000
Norway Europe & Central Asia High income 75496.754 72.0874893 3.696 4.588000 7.91198 2.1951235 84.300 81.000 2.5 6.9926518 100.00000
New Zealand East Asia & Pacific High income 42992.895 45.7554990 5.239 4.289000 6.25974 1.8647222 83.400 80.000 5.2 6.8389563 100.00000
Oman Middle East & North Africa High income 17329.185 26.3415163 10.831 1.363000 5.84019 0.2239095 79.906 75.645 11.2 14.9809491 100.00000
Panama Latin America & Caribbean Upper middle income 15146.409 24.1528164 5.092 3.045000 2.88224 7.6227739 81.412 75.060 15.9 2.4958824 93.70000
Peru Latin America & Caribbean Upper middle income 6710.508 47.0211237 3.907 3.507000 3.93131 2.7250000 79.031 73.612 14.4 1.7170044 94.80000
Papua New Guinea East Asia & Pacific Lower middle income 2695.249 7.3939611 1.508 3.447000 1.96493 0.6624564 65.326 62.784 48.2 0.9125344 54.40000
Philippines East Asia & Pacific Lower middle income 3123.246 88.4074079 2.698 2.458000 4.40000 18.7476943 75.268 66.971 28.5 1.2987183 93.00000
Poland Europe & Central Asia High income 13864.682 38.2506162 4.912 4.872000 4.55846 35.6216728 81.800 73.900 4.6 8.2375624 100.00000
Portugal Europe & Central Asia High income 21437.348 34.2327108 9.350 8.410000 5.01561 10.3051495 84.600 78.400 3.6 5.1775191 100.00000
Qatar Middle East & North Africa High income 59124.867 81.0743462 0.639 0.062000 2.96746 1.2184508 81.743 78.830 7.0 32.1793706 100.00000
Romania Europe & Central Asia Upper middle income 10807.009 11.1580967 4.045 5.606000 3.09539 37.1305633 79.100 71.700 7.9 3.7827902 100.00000
Russian Federation Europe & Central Asia Upper middle income 10720.333 39.6026644 5.050 5.360000 4.68991 7.4280983 77.640 67.510 6.9 10.7766446 100.00000
Saudi Arabia Middle East & North Africa High income 20802.466 65.5515331 21.253 3.213000 NA 1.5988352 76.487 73.671 8.1 16.3347636 99.93000
Singapore East Asia & Pacific High income 61176.456 229.2947188 4.436 4.032000 2.76826 0.7898449 85.400 80.900 2.6 8.4511514 100.00000
Slovenia Europe & Central Asia High income 23455.945 13.0358151 7.468 5.769000 4.78078 9.1369096 84.000 78.200 2.4 6.8428582 100.00000
Thailand East Asia & Pacific Upper middle income 6593.818 120.2557332 0.841 0.821000 3.35573 32.9033647 80.468 72.977 9.9 3.7662287 99.90000
Tunisia Middle East & North Africa Lower middle income 3687.777 21.1616433 22.609 12.378000 NA 16.7803811 78.350 74.297 17.1 2.6142618 100.00000
Turkey Europe & Central Asia Upper middle income 10589.668 26.4857753 13.848 9.342000 NA 25.9839143 80.088 74.149 11.4 5.1271967 100.00000
Ukraine Europe & Central Asia Lower middle income 2638.326 4.6373230 7.737 11.153000 5.41226 56.5751769 76.780 67.020 8.9 3.8999682 100.00000
United States North America High income 60109.656 164.3592942 4.312 4.402000 NA 17.2438567 81.100 76.100 6.6 14.8058824 100.00000
Vietnam East Asia & Pacific Lower middle income 2365.522 55.9970069 1.698 2.027000 4.08554 22.5378140 79.366 71.124 21.4 2.3480813 100.00000
South Africa Sub-Saharan Africa Upper middle income 6690.940 322.7109753 29.283 25.219999 6.11306 9.8920937 67.064 60.162 34.6 7.6327294 84.40000
The variables region and income are categorical and take only predefined values. Since they are stored as characters, we convert them to factors. Region has 7 levels and income has 3. However, the income levels are in alphabetical order rather than in order of income, so we reorder them to facilitate the analysis.
Levels of region:
East Asia & Pacific
Europe & Central Asia
Latin America & Caribbean
Middle East & North Africa
North America
South Asia
Sub-Saharan Africa
Levels of income (before reordering):
High income
Lower middle income
Upper middle income
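A minimal sketch of this conversion in base R. It assumes `dev.inc` is a data frame with character columns `region` and `income`; the three-row stand-in below is hypothetical (its values are copied from the table above), and the lowest-to-highest level order is our reading of the intended reordering.

```r
# Hypothetical three-row stand-in for dev.inc (values copied from the table)
dev.inc <- data.frame(
  country = c("Austria", "Bangladesh", "Brazil"),
  region  = c("Europe & Central Asia", "South Asia", "Latin America & Caribbean"),
  income  = c("High income", "Lower middle income", "Upper middle income"),
  stringsAsFactors = FALSE
)

# region has no natural order, so the default alphabetical levels are fine
dev.inc$region <- factor(dev.inc$region)

# By default factor() sorts the income levels alphabetically ...
print(levels(factor(dev.inc$income)))

# ... so we pass the levels explicitly, from lowest to highest income
income.levels <- c("Lower middle income", "Upper middle income", "High income")
dev.inc$income <- factor(dev.inc$income, levels = income.levels)
print(levels(dev.inc$income))
```

Any value not listed in `levels` silently becomes `NA`, so checking `sum(is.na(dev.inc$income))` after the conversion is a cheap safeguard against typos in the level names.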

Quality of Data

As previously stated, the data set contains 68 observations and 14 variables. The first variable specifies the country of each observation. The second and third variables are categorical and describe each country’s region and income category as classified by the World Bank in 2017. This section examines the data for missing values and outliers.

Missing Values

This section investigates the existence of missing values and identifies any features that contain many of them. Figure 1 shows the proportion and position of missing values for each feature in black. Missing values account for 1.3 percent of all cells, and they all occur in Education.Expend: 12 of its 68 observations are missing, i.e. 18%. At first glance, there does not appear to be any pattern in the missingness. To identify the type of missingness and visualize any existing relationships, we plot Education.Expend against the categorical variables income and region, as well as the continuous variables GDP.percap and Market.Cap.pcntGDP.

library(visdat)
vis_miss(dev.inc)  # visualise which cells of each variable are missing
Figure 1: Missing Values

sum(is.na(dev.inc))
## [1] 12
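The two percentages quoted above follow from the counts: 12 missing cells out of 68 × 14 = 952, and 12 missing values out of the 68 observations of Education.Expend. The sketch below reproduces that arithmetic (it prints 1.3 and 17.6) and adds a small per-column helper, `pct_missing` (a hypothetical name), applied to a four-row toy data frame built from the table.

```r
# Counts taken from the text
n.obs  <- 68   # countries
n.vars <- 14   # variables
n.miss <- 12   # missing values, all in Education.Expend

pct.total <- 100 * n.miss / (n.obs * n.vars)  # share of all cells
pct.edu   <- 100 * n.miss / n.obs             # share of Education.Expend values
print(round(c(total = pct.total, Education.Expend = pct.edu), 1))

# Hypothetical helper: percentage of missing values in each column
pct_missing <- function(df) 100 * colMeans(is.na(df))

# Four countries copied from the table (three have no Education.Expend value)
toy <- data.frame(
  country = c("Canada", "Chile", "India", "Japan"),
  Education.Expend = c(NA, 5.41966, NA, NA)
)
print(pct_missing(toy))
```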

The figure below shows the percentage of missing values per region and income category. The first row shows that all Education.Expend values are missing for North America, about two-thirds for South Asia and about a third for the Middle East and North Africa. The second row shows that 40% of the values are missing for lower middle income countries and less than 20% for high income countries.

Figure 2: Income Categories

The graph below shows Education.Expend plotted against GDP.percap and Market.Cap.pcntGDP, by region and income. The missing values are shown as red dots near the bottom of each panel. They occur across the whole range in both graphs, and their distribution is similar to that of the non-missing observations. In particular, in the first graph some missing values are clustered in the lower range of GDP, while in the second graph they are spread uniformly across the x-axis.

Figure 3: Missing Values

The table below shows the missing values grouped by region. The first column shows the region, the second the variable with missing values, the third the number of missing values and the last the percentage of missing values in that region. There is no education expenditure data for North America, and two-thirds of the data is missing for South Asia. In addition, 36% and 20% of the values are missing for the Middle East & North Africa and Sub-Saharan Africa respectively.
region variable n_miss pct_miss
Middle East & North Africa Education.Expend 5 35.714286
Latin America & Caribbean Education.Expend 0 0.000000
Europe & Central Asia Education.Expend 1 4.545454
East Asia & Pacific Education.Expend 1 8.333333
South Asia Education.Expend 2 66.666667
North America Education.Expend 2 100.000000
Sub-Saharan Africa Education.Expend 1 20.000000
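The `n_miss` and `pct_miss` columns have the same names as the output of naniar's `miss_var_summary()`. As a dependency-free alternative, a small base R helper (the name `miss_by_group` is hypothetical) gives the same two numbers per group. The example uses the South Asian and North American rows of the data table.

```r
# Hypothetical helper: number and percentage of missing values of x per group
miss_by_group <- function(x, group) {
  missing <- is.na(x)
  n.miss <- tapply(missing, group, sum)   # count of NAs per group
  pct    <- tapply(missing, group, mean)  # share of NAs per group
  data.frame(
    group    = names(n.miss),
    n_miss   = as.vector(n.miss),
    pct_miss = as.vector(100 * pct)
  )
}

# Education.Expend for Bangladesh, India, Sri Lanka (South Asia)
# and Canada, United States (North America), copied from the table
edu <- c(NA, NA, 2.79925, NA, NA)
reg <- c(rep("South Asia", 3), rep("North America", 2))
res <- miss_by_group(edu, reg)
print(res)
```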

Similarly, the table below shows the missing values grouped by income category; it has the same layout as the table above, except that the first column shows the income category. A similar number of values is missing for lower middle income (6) and high income (5) countries, but the lower middle income category contains fewer countries than the high income category, so its proportion of missing values is larger (40% versus 15%).

income variable n_miss pct_miss
High income Education.Expend 5 15.15152
Upper middle income Education.Expend 1 5.00000
Lower middle income Education.Expend 6 40.00000

These observations indicate that MCAR (missing completely at random) does not apply to this variable, so we need to consider the missing data to be either MAR or MNAR. MAR (missing at random) assumes that the probability that a value is missing depends only on the observed data, not on the unobserved value itself. Conditioning on income, we find that the univariate distributions of each of the other variables appear similar for observations where Education.Expend is missing and for those where it is not. Hence, missingness appears to be explained by an observed variable (income) rather than an unobserved one, which suggests that MAR is more appropriate than MNAR.
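One informal way to back up the claim that missingness depends on income is a test of independence between a "missing" indicator and the income category. The sketch below uses Fisher's exact test from base R's stats package; the group sizes (15, 20 and 33 countries) are inferred from the `n_miss` and `pct_miss` columns of the income table. Such a test can only provide evidence against MCAR: observed data alone cannot distinguish MAR from MNAR.

```r
# Missing / observed counts of Education.Expend per income category
counts <- matrix(
  c(6, 15 - 6,    # Lower middle income: 6 of 15 missing
    1, 20 - 1,    # Upper middle income: 1 of 20 missing
    5, 33 - 5),   # High income: 5 of 33 missing
  ncol = 2, byrow = TRUE,
  dimnames = list(income = c("Lower middle", "Upper middle", "High"),
                  status = c("missing", "observed"))
)
print(counts)

# Fisher's exact test of independence between missingness and income;
# a small p-value is evidence against MCAR for Education.Expend
result <- fisher.test(counts)
print(result$p.value)
```

With only 12 missing values the test has little power, so a large p-value would not establish MCAR either.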

We chose to drop the Education.Expend column, as we lack all data for North America and a significant proportion of the data for South Asia. Its non-missing values are kept in a separate data frame, ed.exp, for later use.

library(dplyr)

# Keep the non-missing Education.Expend values (with country) for later plots
ed.exp <- dev.inc %>% select(country, Education.Expend) %>% na.omit()
# Drop Education.Expend from the main data set
dev.inc <- dev.inc %>% select(-Education.Expend)
# Keep only the numeric columns
num.data <- dev.inc %>% select(where(is.numeric))

Outliers and Errors

Next, we look for outliers. The figure below shows box plots for each numeric variable in the data set, with the outliers shown as red points. All variables have some observations that deviate from the rest, but none are out of bounds (for example, negative or above a plausible range) and they make logical sense. This is expected, as these variables describe development indicators on which some countries perform much better or worse than others.

Note: box plots are not always appropriate for detecting outliers, particularly when a variable’s distribution is highly skewed. Had we noticed any anomalous data, it would have been worth investigating further by plotting histograms and applying the IQR method.
Figure 4: Boxplots
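The IQR method mentioned in the note flags values lying more than 1.5 interquartile ranges below the first quartile or above the third. A minimal base R sketch (the helper name `iqr_outliers` is hypothetical), applied to under-5 mortality for six countries copied from the table; only Nigeria is flagged. `quantile()` uses its default type-7 quartiles here, whereas `boxplot()` computes its whiskers from Tukey's hinges (see `boxplot.stats()`), so the two can occasionally disagree for points near the fences.

```r
# Flag values outside [Q1 - k * IQR, Q3 + k * IQR]; k = 1.5 is the usual choice
iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  x < q[1] - k * spread | x > q[2] + k * spread
}

# Under-5 mortality (per 1,000 live births) for six countries, from the table
mort <- c(Austria = 3.6, Belgium = 4.1, Germany = 3.9,
          France = 4.3, Spain = 3.3, Nigeria = 122.5)
print(names(mort)[iqr_outliers(mort)])
```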

Exploration of the univariate and multivariate distribution

So far, we have investigated the structure and quality of the data. The following section presents a brief exploration of the univariate and multivariate distributions of the data.

Univariate Analysis

For the univariate analysis, we investigate the distribution of the categorical and numerical variables. The figure below shows the number of countries in each category on the left plot, the violin plots of the numerical variables on the centre plot, and the histogram of education expenditure on the right plot. Most countries are classified as high income (33 of 68), followed by upper middle income (20), and the majority are located in Europe & Central Asia (22) and the Middle East & North Africa (14).

Figure 5: Distribution of categorical variables

The next table shows the summary statistics of each numeric variable. For GDP per capita, the mean (22,719 USD) is considerably higher than the median (13,747 USD). This is caused by the countries with the highest GDP per capita, such as Luxembourg (109,921 USD) and Switzerland (83,352 USD), which pull the mean upwards.

##    GDP.percap     Market.Cap.pcntGDP Unemployment.female Unemployment.male
##  Min.   :  1564   Min.   :  0.2076   Min.   : 0.639      Min.   : 0.062   
##  1st Qu.:  5482   1st Qu.: 24.1902   1st Qu.: 4.198      1st Qu.: 3.708   
##  Median : 13747   Median : 46.6438   Median : 6.069      Median : 5.359   
##  Mean   : 22719   Mean   : 64.3540   Mean   : 8.844      Mean   : 6.556   
##  3rd Qu.: 38737   3rd Qu.: 82.6290   3rd Qu.:10.940      3rd Qu.: 8.414   
##  Max.   :109921   Max.   :322.7110   Max.   :29.283      Max.   :25.220   
##  Arable.Land.pcnt  Life.Expect.female Life.Expect.male  Mortality.u5    
##  Min.   : 0.2239   Min.   :54.84      Min.   :53.09    Min.   :  2.400  
##  1st Qu.: 4.2209   1st Qu.:77.59      1st Qu.:71.99    1st Qu.:  4.275  
##  Median :11.8564   Median :79.94      Median :75.00    Median :  8.000  
##  Mean   :15.8681   Mean   :79.23      Mean   :74.29    Mean   : 14.359  
##  3rd Qu.:23.0358   3rd Qu.:83.90      3rd Qu.:78.92    3rd Qu.: 15.250  
##  Max.   :59.5938   Max.   :87.26      Max.   :81.60    Max.   :122.500  
##  CO2.emiss.mtpercap Access2Elec.pcnt
##  Min.   : 0.4248    Min.   : 52.50  
##  1st Qu.: 2.4907    1st Qu.: 99.68  
##  Median : 4.7417    Median :100.00  
##  Mean   : 6.6313    Mean   : 96.52  
##  3rd Qu.: 8.1760    3rd Qu.:100.00  
##  Max.   :32.1794    Max.   :100.00
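A small illustration of why a few very rich countries pull the mean of GDP.percap above its median: the mean responds to every value, whereas the median depends only on the middle of the sorted data. The five values are copied from the table above; the choice of countries is arbitrary.

```r
# GDP per capita (current USD) for five countries, from the table
gdp <- c(Bangladesh = 1563.768, India = 1980.667, Brazil = 9928.676,
         Spain = 28100.586, Luxembourg = 109921.031)

# Mean vs median with all five countries
print(c(mean = mean(gdp), median = median(gdp)))

# Capping Luxembourg at Spain's value changes the mean but not the median
capped <- gdp
capped["Luxembourg"] <- gdp["Spain"]
print(c(mean = mean(capped), median = median(capped)))
```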
Next, the figure below shows the violin plots for the numeric variables. Except for the variables describing life expectancy and access to electricity, all of the numeric distributions are positively skewed. Life expectancy appears to have a bimodal distribution, as evidenced by the presence of two peaks. Most of the observations for Access to Electricity are gathered around 100%, with a portion of them ranging between 50% and 100%.
Figure 6: Distribution of numerical variables

Finally, we look at the distribution of Education Expenditure (%), which is close to normal, with two outliers on the right.
Figure 7: Distribution of Education Expenditure

Multivariate Analysis

The figure below is a visual representation of the correlation structure of the numerical variables and its significance levels. The heatmap shows the pairwise Pearson correlation coefficients between the variables; pairs whose correlation is not statistically significant are left blank. We observe that:

  • Mortality has a very strong negative correlation with access to electricity and with life expectancy for both genders.
  • Arable land is not strongly correlated with any other variable.
  • Unemployment rates for males and females are strongly correlated with each other.
  • CO2 emissions have a moderate positive correlation with GDP per capita, while GDP per capita has a moderate positive correlation with life expectancy.
  • Life expectancies for the two genders are very strongly positively correlated with each other and with access to electricity.
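For a single pair of variables, a Pearson coefficient together with its significance test can be computed with base R's `cor.test()`. The six countries below are copied from the table (female life expectancy and under-5 mortality). The 0.05 threshold is an assumption: the significance level used to blank out cells in the heatmap is not stated.

```r
# Female life expectancy (years) and under-5 mortality (per 1,000 live births)
# for Nigeria, Cote d'Ivoire, Namibia, India, Brazil and Japan
life <- c(54.843, 58.320, 65.823, 70.425, 79.156, 87.260)
mort <- c(122.5, 86.0, 44.9, 38.6, 15.4, 2.6)

ct <- cor.test(life, mort, method = "pearson")
print(unname(ct$estimate))  # Pearson correlation coefficient
print(ct$p.value)           # two-sided p-value

# Assumed rule for the heatmap: show the cell only if p < 0.05
alpha <- 0.05
print(ct$p.value < alpha)
```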
Figure 8: Correlation heatmap of the numerical variables

Based on the observations above, we investigate the relationships between the variables that display strong correlations. In particular, we look at:

  • Income categories and their relationship with GDP per capita (current USD) and market capitalization of domestic listed companies (% of GDP). Moreover, we look at the number of countries in each income category for each region and display maps of GDP per capita and market capitalization for each country.
  • CO2 emissions per capita and their relationship with GDP per capita and income categories.
  • Unemployment rate by gender for each country and region. In addition, we explore how it relates to income categories and GDP per capita.
  • Life expectancy by gender for each country and region. In addition, we explore how it relates to GDP per capita and market capitalization.
  • Mortality and its relationship with female life expectancy and access to electricity.
  • Percentage of access to electricity globally for each country.
  • Arable land globally for each country.
  • Education Expenditure globally for each country.

Income Categories

First, we explore the relationship between income categories, GDP per capita (current USD) and market capitalization of domestic listed companies (% of GDP). The table below gives, for each income category, the maximum and minimum of both indicators.
income maxGDP minGDP maxMarketCap minMarketCap
Lower middle income 5520.315 1563.768 96.39883 0.2076133
Upper middle income 15146.409 4231.518 322.71098 4.9751435
High income 109921.031 13629.290 239.39652 12.3405769
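A base R sketch of how a per-category range table like the one above can be computed: `aggregate()` once with `max` and once with `min`, then `merge()`. The four-row data frame is a hypothetical subset of the data (Iran, Bangladesh, Luxembourg and Croatia). With dplyr, `group_by(income)` followed by `summarise()` with `max()` and `min()` would give the same result.

```r
# Hypothetical four-row subset of dev.inc (values copied from the table)
d <- data.frame(
  income = c("Lower middle income", "Lower middle income", "High income", "High income"),
  GDP.percap = c(5520.315, 1563.768, 109921.031, 13629.290),
  Market.Cap.pcntGDP = c(23.8756332, 34.5114059, 104.7127658, 40.4963655)
)

# Maximum and minimum of both indicators within each income category
mx <- aggregate(cbind(GDP.percap, Market.Cap.pcntGDP) ~ income, data = d, FUN = max)
mn <- aggregate(cbind(GDP.percap, Market.Cap.pcntGDP) ~ income, data = d, FUN = min)

# One row per category; suffixes mark which statistic each column holds
ranges <- merge(mx, mn, by = "income", suffixes = c(".max", ".min"))
print(ranges)
```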
The figure below displays the relationship between GDP per capita and market capitalisation for each income category. The relationship appears roughly linear, and the trend line bends at the end because of South Africa’s unusually high market capitalisation (about 323% of GDP).
Figure 9: GDP per capita and market capitalisation by income category

The figure below breaks down the number of countries in each income category for each region. All countries in North America are classified as high income, and all countries in South Asia as lower middle income. There is no high-income country in Sub-Saharan Africa and, in contrast, no lower middle income country in Latin America & Caribbean or North America. All other regions contain countries at all three income levels. Most countries in Europe and Central Asia are high income, and only one is lower middle income, which the next code section identifies as Ukraine. The Middle East & North Africa shows some inequality, as its countries are mostly either high income or lower middle income, with only two in the upper middle category. Finally, in East Asia & Pacific, countries are spread more evenly across the categories.
Figure 10: Number of countries in Income Categories

We filter the data to find which country in Europe and Central Asia is in the lower middle income category.
country region income
Ukraine Europe & Central Asia Lower middle income
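A sketch of both steps on a hypothetical six-row subset of the data: `table()` cross-tabulates region against income, which gives the kind of counts plotted in the figure above, and logical indexing picks out the lower middle income country in Europe & Central Asia.

```r
# Hypothetical six-row subset of dev.inc (values copied from the table)
d <- data.frame(
  country = c("Austria", "Poland", "Romania", "Ukraine", "India", "Sri Lanka"),
  region  = c(rep("Europe & Central Asia", 4), rep("South Asia", 2)),
  income  = factor(
    c("High income", "High income", "Upper middle income",
      "Lower middle income", "Lower middle income", "Lower middle income"),
    levels = c("Lower middle income", "Upper middle income", "High income")
  ),
  stringsAsFactors = FALSE
)

# Number of countries in each region x income combination
counts <- table(d$region, d$income)
print(counts)

# Which Europe & Central Asia country is lower middle income?
sub <- d[d$region == "Europe & Central Asia" & d$income == "Lower middle income",
         c("country", "region", "income")]
print(sub)
```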

These maps show GDP per capita and market capitalisation for each country in the data set.

CO2 and income

The first plot shows violin plots of CO2 emissions per capita for each income category. Average CO2 emissions rise with the income category (about 2.2, 4.5 and 9.9 metric tons per capita for the lower middle, upper middle and high income categories respectively), and the range of the distribution also widens as we move up the categories. The second graph shows the relationship between CO2 emissions, GDP per capita and income category; the coloured dashed lines show the mean CO2 emissions for each income category. The countries in the high-income category produce the most CO2 emissions: seven countries emit more than 15 metric tons per capita (with Oman and the United States just below that level), whilst most countries stay below 10 tons per capita. These graphs also show how GDP per capita is distributed within each income category. While the observations in the upper and lower middle income categories cover a narrow range below 20,000 USD, the values in the high-income category are more dispersed and cover a range more than triple that of the other two categories. The fitted curve rises gradually up to about 12 tons per capita, showing that as GDP increases, CO2 emissions increase at a slower rate.


Unemployment by gender

The following graphs show the unemployment rates for each gender in each country and region. In some countries there is a large gap between the genders, especially in the United Arab Emirates, Saudi Arabia and Bahrain. In general, unemployment rates for women are higher than those for men. The second graph confirms that in the Middle East & North Africa the unemployment rate for females is considerably higher than for males. In Europe and Central Asia there is also some inequality between the two genders in favour of males, while in East Asia and the Pacific the rates for each gender are almost the same. The female unemployment rate is higher in every region except North America, where the opposite appears to hold.
Figure 12: Unemployment by gender for each country and region

The maps show the unemployment rate by gender globally, for each country.

The figure below shows the unemployment rate by gender for each income category. In every income category, the unemployment rate for females is higher than for males. The largest gap is in the lower middle income category and the smallest in the upper middle income category.
Figure 14: Unemployment by gender for each income category

The figure below shows the relationship between the unemployment rate for each gender and GDP per capita for each income category. The colour of each point indicates the income category, and the black dashed line shows the mean unemployment rate for each gender. What stands out first is the difference in mean values: the mean unemployment rate is higher for females than for males. The trend lines for both genders point in the same direction, suggesting that the unemployment rate decreases as GDP per capita increases. However, the unemployment rate for males does not seem to be affected by GDP per capita as much, since its trend line is almost flat. The trend line for females also shows an increase, although the points there appear to remain in the same range. Finally, countries in the upper and lower middle income categories, with some exceptions, have the highest unemployment rates. Surprisingly, some high-income countries have unemployment rates similar to those of countries in the upper and lower middle income categories.

Figure 15: Unemployment by gender for each income category and GDP

Life expectancy by gender

The first graph below shows life expectancy by gender for each country. Life expectancy for both genders is above 50 years in every country (the minima are 54.8 years for women and 53.1 for men). The second graph shows life expectancy by gender for each region. Male life expectancy is around 75 years or higher in most regions, but noticeably lower in Sub-Saharan Africa and South Asia. In every region, female life expectancy is higher than male life expectancy. Finally, the last graph shows life expectancy by gender for each income category: life expectancy tends to be higher in higher income categories, and again it is higher for females.
##  Life.Expect.female Life.Expect.male
##  Min.   :54.84      Min.   :53.09
##  1st Qu.:77.59      1st Qu.:71.99
##  Median :79.94      Median :75.00
##  Mean   :79.23      Mean   :74.29
##  3rd Qu.:83.90      3rd Qu.:78.92
##  Max.   :87.26      Max.   :81.60
Figure 16: Life expectancy by gender

The graph shows the relationship between life expectancy for each gender and GDP per capita. The colour of each point depends on market capitalisation (% of GDP), and the two black dashed lines indicate the mean life expectancy of each gender. The two variables have a curvilinear relationship: as GDP per capita increases, life expectancy also increases, but at a diminishing rate. It rises rapidly at first and then, beyond a certain point, the curve flattens out. Life expectancy for males is lower than for females in every country, which is also reflected in the lower mean. Life expectancy also tends to increase with market capitalisation. The main exception is South Africa, where market capitalisation reached about 323% of GDP, while life expectancy is approximately 67 years for women and 60 for men.

Figure 17: GDP and life expectancy for women

Mortality

The graph shows the relationship between female life expectancy, the under-5 mortality rate and access to electricity. The colour of the points indicates the percentage of the population with access to electricity, and the two dashed lines show the mean values of life expectancy and mortality. As mentioned above, these variables are highly correlated, and indeed we observe an almost linear negative relationship. Most countries are clustered in the top left of the graph, with only a few points scattered away from them. The mean female life expectancy is about 79 years, and the mean under-5 mortality rate is about 14 deaths per 1,000 live births. We can also see that as the mortality rate increases, access to electricity decreases steadily.

Figure 18: Mortality rate

The next map shows the under-5 mortality rate (deaths of children under five per 1,000 live births) in each country.

Access to electricity

The plot below shows only the countries where access to electricity is below 100%. Apart from two high-income countries just below that level, Chile (99.7%) and Saudi Arabia (99.93%), all of them are classified as lower middle or upper middle income.
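A sketch of the filter behind this plot, on a hypothetical five-row subset of the data: keep the rows with `Access2Elec.pcnt < 100` and tabulate their income categories. With a strict `< 100` comparison, Chile (99.7%) and Saudi Arabia (99.93%), both high income, are kept as well.

```r
# Hypothetical five-row subset of dev.inc (values copied from the table)
d <- data.frame(
  country = c("Austria", "Chile", "Saudi Arabia", "Nigeria", "Brazil"),
  income  = c("High income", "High income", "High income",
              "Lower middle income", "Upper middle income"),
  Access2Elec.pcnt = c(100, 99.7, 99.93, 54.4, 99.8),
  stringsAsFactors = FALSE
)

# Keep only countries without full access to electricity
below <- d[d$Access2Elec.pcnt < 100, ]
print(below)

# Income categories among those countries
print(table(below$income))
```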

Arable Land Map

Education expenditure Map

In the next part, we look for clustering behaviour, which we then investigate further using various dimension reduction and clustering methods.